{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 感知机 perceptron\n",
    "\n",
    "感知机所要处理的问题为数据二分类的问题， 模型要判断这个数据属于两个事先规定好的类别中的哪一个\n",
    "\n",
    "例如逻辑与操作，(0, 0)、(0, 1)与(1, 0)点对应的结果为0，(1, 1)点对应的结果为1。\n",
    "\n",
    "机器对这4个点的处理方法为找到一条线，这样的话点到这条线的偏移就会有正负之分， y=-x+1.5\n",
    "\n",
    "优点： 模型简单，易于实现。\n",
    "\n",
    "缺点：\n",
    "\n",
    "  ① 无法完美地处理线性不可分地训练数据\n",
    "\n",
    "  ② 最终迭代代数受结果超平面以及训练集的数据影响很大\n",
    "\n",
    "  ③ 损失函数的目标只是减小所有误分类点与超平面，最终很有可能会导致部分样本点距离超平面很近，从某种程度上说这样的分类效果并不是特别的好，这个问题会在支持向量机中得到很好的解决。\n",
    "\n",
    "[感知机](https://zhuanlan.zhihu.com/p/141512375)"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 基于sklearn的感知机 实现鸢尾花分类\n",
    "\n",
    "鸢尾花的数据集: 通过四种特征(花瓣长度,花瓣宽度,花萼长度,花萼宽度)来实现三种鸢尾花的类别划分"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "import sklearn\n",
    "from sklearn import datasets\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import accuracy_score"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "iris = datasets.load_iris() # 载入数据\n",
    "X = iris.data[:,[2,3]] # 特征选择，只选后2个特征\n",
    "y= iris.target  # 获取标签\n",
    "#数据划分\n",
    "X_train,X_test,y_train,y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "均值 Mean: [0.00000000e+00 3.03637713e-01 5.17074981e+00 1.19101707e+01\n",
      " 1.18908686e+01 5.82553823e+00 1.33556050e+00 1.31403118e-01\n",
      " 5.93912398e-03 1.93986637e+00 1.04083148e+01 1.19792131e+01\n",
      " 1.01536748e+01 8.18188567e+00 1.87602079e+00 1.21009651e-01\n",
      " 1.48478099e-03 2.58945805e+00 1.00319228e+01 6.92650334e+00\n",
      " 6.88047513e+00 7.79881218e+00 1.79435783e+00 5.64216778e-02\n",
      " 0.00000000e+00 2.48181143e+00 9.10170750e+00 8.74164811e+00\n",
      " 9.88418708e+00 7.56792873e+00 2.29101707e+00 2.96956199e-03\n",
      " 0.00000000e+00 2.29992576e+00 7.59985152e+00 9.08760208e+00\n",
      " 1.03808463e+01 8.75649592e+00 2.82553823e+00 0.00000000e+00\n",
      " 1.11358575e-02 1.53006682e+00 6.77134373e+00 7.10319228e+00\n",
      " 7.68151448e+00 8.31700074e+00 3.43578322e+00 3.41499629e-02\n",
      " 9.65107647e-03 7.15664439e-01 7.52932442e+00 9.41722346e+00\n",
      " 9.33778768e+00 8.84038604e+00 3.74461767e+00 1.95991091e-01\n",
      " 7.42390497e-04 2.81365999e-01 5.52041574e+00 1.21314031e+01\n",
      " 1.18871566e+01 6.86562732e+00 2.09651076e+00 3.49665924e-01]\n",
      "标准差 Scale [1.         0.91299786 4.74914344 4.18982848 4.23747183 5.65055144\n",
      " 3.25509442 1.08978576 0.10177521 3.16745341 5.37802085 4.00235806\n",
      " 4.83546577 6.05643087 3.6118562  0.90173281 0.03850424 3.54815952\n",
      " 5.66482965 5.77091264 6.15475725 6.2704737  3.27030657 0.47170256\n",
      " 1.         3.15784976 6.17482652 5.87951874 6.09415929 5.93171578\n",
      " 3.69191429 0.05441272 1.         3.47681151 6.3265106  6.2837899\n",
      " 5.85430798 5.87194016 3.51663788 1.         0.16536155 2.9707223\n",
      " 6.51385217 6.44470837 6.24044152 5.69932339 4.32591981 0.35150446\n",
      " 0.23576714 1.77169435 5.61758241 5.26587256 5.31273677 5.99979479\n",
      " 4.92528802 0.90706145 0.02723673 0.920912   5.09684886 4.35657706\n",
      " 4.86208059 5.89796176 4.09496165 1.80946685]\n"
     ]
    }
   ],
   "source": [
    "# 特征处理 - 标准化处理\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "sc = StandardScaler()\n",
    "sc.fit(X_train) # 估算每个特征的平均值和标准差\n",
    "print('均值 Mean:', sc.mean_)\n",
    "print('标准差 Scale', sc.scale_)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [],
   "source": [
    "X_train_std = sc.transform(X_train)  # 对数据做标准化\n",
    "\n",
    "X_test_std = sc.transform(X_test)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [
    {
     "data": {
      "text/plain": "0.92"
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 训练感知机模型\n",
    "from sklearn.linear_model import Perceptron\n",
    "\n",
    "# n_iter：可以理解成梯度下降中迭代的次数\n",
    "# eta0：可以理解成梯度下降中的学习率\n",
    "# random_state：设置随机种子的，为了每次迭代都有相同的训练集顺序\n",
    "ppn = Perceptron(eta0=0.1, random_state=1)\n",
    "ppn.fit(X_train_std, y_train)\n",
    "y_pred = ppn.predict(X_test_std)\n",
    "accuracy_score(y_test, y_pred)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 基于sklearn-MLP多层感知机实例\n",
    "\n",
    "> 基于手写MNIST数据集来进行MLP实例\n",
    "\n",
    "#### sklearn.neural_network.MLPClassifier 参数\n",
    "\n",
    " * hidden_layer_sizes :元祖格式，长度=n_layers-2, 默认(100，），第i个元素表示第i个隐藏层的神经元的个数,如(100,100)表示两层，每层100个神经元。\n",
    "\n",
    " * activation :{‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, 默认‘relu\n",
    "    - ‘identity’： no-op activation, useful to implement linear bottleneck，\n",
    "    返回f(x) = x\n",
    "    - ‘logistic’：the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).\n",
    "    - ‘tanh’：the hyperbolic tan function, returns f(x) = tanh(x).\n",
    "    - ‘relu’：the rectified linear unit function, returns f(x) = max(0, x)\n",
    "\n",
    " * solver： {‘lbfgs’, ‘sgd’, ‘adam’}, 默认 ‘adam’，用来优化权重\n",
    "    - lbfgs：quasi-Newton方法的优化器\n",
    "    - sgd：随机梯度下降\n",
    "    - adam： Kingma, Diederik, and Jimmy Ba提出的机遇随机梯度的优化器\n",
    "  > 注意：默认solver ‘adam’在相对较大的数据集上效果比较好（几千个样本或者更多），对小数据集来说，lbfgs收敛更快效果也更好。\n",
    "\n",
    " * alpha :float,可选的，默认0.0001,正则化项参数\n",
    "\n",
    " * batch_size : int , 可选的，默认‘auto’,随机优化的minibatches的大小，如果solver是‘lbfgs’，分类器将不使用minibatch，当设置成‘auto’，batch_size=min(200,n_samples)\n",
    "\n",
    " * learning_rate :{‘constant’，‘invscaling’, ‘adaptive’},默认‘constant’，用于权重更新，只有当solver为’sgd‘时使用\n",
    "    - ‘constant’: 有‘learning_rate_init’给定的恒定学习率\n",
    "    - ‘incscaling’：随着时间t使用’power_t’的逆标度指数不断降低学习率learning_rate_ ，effective_learning_rate = learning_rate_init / pow(t, power_t)\n",
    "    - ‘adaptive’：只要训练损耗在下降，就保持学习率为’learning_rate_init’不变，当连续两次不能降低训练损耗或验证分数停止升高至少tol时，将当前学习率除以5.\n",
    "\n",
    "max_iter: int，可选，默认200，最大迭代次数。\n",
    "\n",
    "random_state:int 或RandomState，可选，默认None，随机数生成器的状态或种子。\n",
    "\n",
    "shuffle: bool，可选，默认True,只有当solver=’sgd’或者‘adam’时使用，判断是否在每次迭代时对样本进行清洗。\n",
    "\n",
    "tol：float, 可选，默认1e-4，优化的容忍度\n",
    "\n",
    "learning_rate_int:double,可选，默认0.001，初始学习率，控制更新权重的补偿，只有当solver=’sgd’ 或’adam’时使用。\n",
    "\n",
    "power_t: double, optional, default 0.5，只有solver=’sgd’时使用，是逆扩展学习率的指数.当learning_rate=’invscaling’，用来更新有效学习率。\n",
    "\n",
    "verbose : bool, optional, default False,是否将过程打印到stdout\n",
    "\n",
    "warm_start : bool, optional, default False,当设置成True，使用之前的解决方法作为初始拟合，否则释放之前的解决方法。\n",
    "\n",
    "momentum : float, default 0.9,Momentum(动量） for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.\n",
    "\n",
    "nesterovs_momentum : boolean, default True, Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.\n",
    "\n",
    "early_stopping : bool, default False,Only effective when solver=’sgd’ or ‘adam’,判断当验证效果不再改善的时候是否终止训练，当为True时，自动选出10%的训练数据用于验证并在两步连续爹迭代改善低于tol时终止训练。\n",
    "\n",
    "validation_fraction : float, optional, default 0.1,用作早期停止验证的预留训练数据集的比例，早0-1之间，只当early_stopping=True有用\n",
    "\n",
    " * beta_1 : float, optional, default 0.9，Only used when solver=’adam’，估计一阶矩向量的指数衰减速率，[0,1)之间\n",
    "\n",
    " * beta_2 : float, optional, default 0.999,Only used when solver=’adam’估计二阶矩向量的指数衰减速率[0,1)之间\n",
    "\n",
    " * epsilon : float, optional, default 1e-8,Only used when solver=’adam’数值稳定值。\n",
    "属性说明：\n",
    "    - classes_:每个输出的类标签\n",
    "    - loss_:损失函数计算出来的当前损失值\n",
    "    - coefs_:列表中的第i个元素表示i层的权重矩阵\n",
    "    - intercepts_:列表中第i个元素代表i+1层的偏差向量\n",
    "    - n_iter_ ：迭代次数\n",
    "    - n_layers_:层数\n",
    "    - n_outputs_:输出的个数\n",
    "    - out_activation_:输出激活函数的名称。\n",
    "\n",
    "#### 方法说明：\n",
    "\n",
    "  * fit(X,y):拟合\n",
    "  * get_params([deep]):获取参数\n",
    "  * predict(X):使用MLP进行预测\n",
    "  * predic_log_proba(X):返回对数概率估计\n",
    "  * predic_proba(X)：概率估计\n",
    "  * score(X,y[,sample_weight]):返回给定测试数据和标签上的平均准确度\n",
    "  * set_params(**params):设置参数。"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "准确率： 0.9271092541008976\n"
     ]
    }
   ],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "\n",
    "digits = load_digits()\n",
    "X = digits.data\n",
    "y = digits.target\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)\n",
    "cls = MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,\n",
    "       beta_2=0.999, early_stopping=False, epsilon=1e-08,\n",
    "       hidden_layer_sizes=(100, 100), learning_rate='constant',\n",
    "       learning_rate_init=0.001, max_iter=200, momentum=0.9,\n",
    "       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,\n",
    "       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,\n",
    "       warm_start=False)\n",
    "print('准确率： %s' % cross_val_score(cls, X, y, cv=5).mean())"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "用Tensor构建一个两层神经网络, 如下步骤\n",
    "\n",
    "1、构建好网络模型\n",
    "\n",
    "2、参数初始化\n",
    "\n",
    "3、前向传播\n",
    "\n",
    "4、计算损失\n",
    "\n",
    "5、反向传播求出梯度\n",
    "\n",
    "6、更新权重"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(14531.6719)\n",
      "tensor(1191.9253)\n",
      "tensor(238.1494)\n",
      "tensor(68.8115)\n",
      "tensor(22.7645)\n",
      "tensor(7.9490)\n",
      "tensor(2.8519)\n",
      "tensor(1.0425)\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "# M是样本数量，input_size是输入层大小\n",
    "# hidden_size是隐含层大小，output_size是输出层大小\n",
    "M, input_size, hidden_size, output_size = 64, 1000, 100, 10\n",
    "\n",
    "# 生成随机数当作样本\n",
    "x = torch.randn(M, input_size) #size(64, 1000)\n",
    "y = torch.randn(M, output_size) #size(64, 10)\n",
    "\n",
    "# 参数初始化\n",
    "def init_parameters():\n",
    "    w1 = torch.randn(input_size, hidden_size)\n",
    "    w2 = torch.randn(hidden_size, output_size)\n",
    "    b1 = torch.randn(1, hidden_size)\n",
    "    b2 = torch.randn(1, output_size)\n",
    "    return {\"w1\": w1, \"w2\":w2, \"b1\": b1, \"b2\": b2}\n",
    "\n",
    "# 定义模型\n",
    "def model(x, parameters):\n",
    "    Z1 = x.mm(parameters[\"w1\"]) + parameters[\"b1\"] # 线性层\n",
    "    A1 = Z1.clamp(min=0) # relu激活函数\n",
    "    Z2 = A1.mm(parameters[\"w2\"]) + parameters[\"b2\"] #线性层\n",
    "    # 为了方便反向求导，我们会把当前求得的结果保存在一个cache中\n",
    "    cache = {\"Z1\": Z1, \"A1\": A1}\n",
    "    return Z2, cache\n",
    "\n",
    "# 计算损失\n",
    "def loss_fn(y_pred, y):\n",
    "    loss = (y_pred - y).pow(2).sum() # 我们这里直接使用 MSE(均方误差) 作为损失函数\n",
    "    return loss\n",
    "\n",
    "# 反向传播，求出梯度\n",
    "def backpropogation(x, y, y_pred, cache, parameters):\n",
    "    m = y.size()[0] # m个样本\n",
    "    # 以下是反向求导的过程：\n",
    "    d_y_pred = 1/m * (y_pred - y)\n",
    "    d_w2 = 1/m * cache[\"A1\"].t().mm(d_y_pred)\n",
    "    d_b2 = 1/m * torch.sum(d_y_pred, 0, keepdim=True)\n",
    "    d_A1 = d_y_pred.mm(parameters[\"w2\"].t())\n",
    "    # 对 relu 函数求导: start\n",
    "    d_Z1 = d_A1.clone()\n",
    "    d_Z1[cache[\"Z1\"] < 0] = 0\n",
    "    # 对 relu 函数求导: end\n",
    "    d_w1 = 1/m * x.t().mm(d_Z1)\n",
    "    d_b1 = 1/m * torch.sum(d_Z1, 0, keepdim=True)\n",
    "    grads = {\n",
    "        \"d_w1\": d_w1,\n",
    "        \"d_b1\": d_b1,\n",
    "        \"d_w2\": d_w2,\n",
    "        \"d_b2\": d_b2\n",
    "    }\n",
    "    return grads\n",
    "\n",
    "# 更新参数\n",
    "def update(lr, parameters, grads):\n",
    "    parameters[\"w1\"] -= lr * grads[\"d_w1\"]\n",
    "    parameters[\"w2\"] -= lr * grads[\"d_w2\"]\n",
    "    parameters[\"b1\"] -= lr * grads[\"d_b1\"]\n",
    "    parameters[\"b2\"] -= lr * grads[\"d_b2\"]\n",
    "    return parameters\n",
    "\n",
    "## 设置超参数 ##\n",
    "\n",
    "learning_rate = 1e-2\n",
    "EPOCH = 400\n",
    "\n",
    "# 参数初始化\n",
    "parameters = init_parameters()\n",
    "\n",
    "## 开始训练 ##\n",
    "for t in range(EPOCH):\n",
    "    # 向前传播\n",
    "    y_pred, cache = model(x, parameters)\n",
    "\n",
    "    # 计算损失\n",
    "    loss = loss_fn(y_pred, y)\n",
    "    if (t+1) % 50 == 0:\n",
    "        print(loss)\n",
    "    # 反向传播\n",
    "    grads = backpropogation(x, y, y_pred, cache, parameters)\n",
    "    # 更新参数\n",
    "    parameters = update(learning_rate, parameters, grads)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "使用nn和optim来构建多层感知机"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(33.7942, grad_fn=<MseLossBackward>)\n",
      "tensor(2.2907, grad_fn=<MseLossBackward>)\n",
      "tensor(0.2463, grad_fn=<MseLossBackward>)\n",
      "tensor(0.0388, grad_fn=<MseLossBackward>)\n",
      "tensor(0.0079, grad_fn=<MseLossBackward>)\n",
      "tensor(0.0019, grad_fn=<MseLossBackward>)\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch.autograd import Variable\n",
    "import torch.nn as nn\n",
    "\n",
    "# M是样本数量，input_size是输入层大小\n",
    "# hidden_size是隐含层大小，output_size是输出层大小\n",
    "M, input_size, hidden_size, output_size = 64, 1000, 100, 10\n",
    "\n",
    "# 生成随机数当作样本，同时用Variable 来包装这些数据，设置 requires_grad=False 表示在方向传播的时候，\n",
    "# 我们不需要求这几个 Variable 的导数\n",
    "x = Variable(torch.randn(M, input_size))\n",
    "y = Variable(torch.randn(M, output_size))\n",
    "\n",
    "# 使用 nn 包的 Sequential 来快速构建模型，Sequential可以看成一个组件的容器。\n",
    "# 它涵盖神经网络中的很多层，并将这些层组合在一起构成一个模型.\n",
    "# 之后，我们输入的数据会按照这个Sequential的流程进行数据的传输，最后一层就是输出层。\n",
    "# 默认会帮我们进行参数初始化\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(input_size, hidden_size),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(hidden_size, output_size),\n",
    ")\n",
    "\n",
    "# 定义损失函数\n",
    "loss_fn = nn.MSELoss(reduction='sum')\n",
    "\n",
    "## 设置超参数 ##\n",
    "learning_rate = 1e-4\n",
    "EPOCH = 300\n",
    "\n",
    "# 使用optim包来定义优化算法，可以自动的帮我们对模型的参数进行梯度更新。这里我们使用的是随机梯度下降法。\n",
    "# 第一个传入的参数是告诉优化器，我们需要进行梯度更新的Variable 是哪些，\n",
    "# 第二个参数就是学习速率了。\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n",
    "\n",
    "## 开始训练 ##\n",
    "for t in range(EPOCH):\n",
    "    # 向前传播\n",
    "    y_pred= model(x)\n",
    "    # 计算损失\n",
    "    loss = loss_fn(y_pred, y)\n",
    "    # 显示损失\n",
    "    if (t+1) % 50 == 0:\n",
    "        print(loss)\n",
    "    # 在我们进行梯度更新之前，先使用optimier对象提供的清除已经积累的梯度。\n",
    "    optimizer.zero_grad()\n",
    "    # 计算梯度\n",
    "    loss.backward()\n",
    "    # 更新梯度\n",
    "    optimizer.step()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}